Towards Spam Mail Detection using Robust Feature Evaluated with Feature Selection Techniques
نویسندگان
چکیده
Filtering of spam emails is a significant operation in email system. The efficiency of this process is determined by many factors such as number of features, representation of samples, classifier etc. This study covers all these factors and aims to find the optimal settings for email spam filtering. Twelve feature selection methods extensively used in text categorization are implemented to synthesize prominent attributes from different categories (i.e. header, subject and body of the mails). Optimal classification performances are obtained for Weighted Mutual Information and Log-TFIDF-Cosine(LTC) feature selection methods for header and body features of the mail with Random Forest and Support Vector Machine classifiers respectively. An overall F1-measure of 0.978 with 0.44s prediction time is achieved when 20% of the original feature length is considered. Keyword-Dimensionality Reduction, Feature Selection, Spam Filtering, Classifier
منابع مشابه
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملCost-Sensitive Spam Detection Using Parameters Optimization and Feature Selection
E-mail spam is no more garbage but risk since it recently includes virus attachments and spyware agents which make the recipients’ system ruined, therefore, there is an emerging need for spam detection. Many spam detection techniques based on machine learning techniques have been proposed. As the amount of spam has been increased tremendously using bulk mailing tools, spam detection techniques ...
متن کاملA Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
متن کاملLearning Spam: Simple Techniques For Freely-Available Software
The problem of automatically filtering out spam e-mail using a classifier based on machine learning methods is of great recent interest. This paper gives an introduction to machine learning methods for spam filtering, reviewing some of the relevant ideas and work in the open source community. An overview of several feature detection and machine learning techniques for spam filtering is given. T...
متن کاملA Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering
We describe experiments with a Naive Bayes text classifier in the context of anti-spam E-mail filtering, using two different statistical event models: a multi-variate Bernoulli model and a multinomial model. We introduce a family of feature ranking functions for feature selection in the multinomial event model that take account of the word frequency information. We present evaluation results on...
متن کامل